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Abstract. Methods of approximate Bayesian computation (ABC) are increas¬ 
ingly used for analysis of complex models. A major challenge for ABC is over¬ 
coming the often inherent problem of high rejection rates in the accept/reject 
methods based on prior:predictive sampling. A number of recent developments aim 
to address this with extensions based on sequential Monte Carlo (SMC) strategies. 
We build on this here, introducing an ABC SMC method that uses data-based 
adaptive weights. This easily implemented and computationally trivial extension 
of ABC SMC can very substantially improve acceptance rates, as is demonstrated 
in a series of examples with simulated and real data sets, including a currently 
topical example from dynamic modelling in systems biology applications. 

Keywords: complex modelling, adaptive simulation, dynamic bionetwork models, 
importance sampling, mixture model emulators. 


1 Introduction 

Methods of approximate Bayesian computation (ABC) are becoming increasing ex¬ 
ploited, especially for problems in which the likelihood function is analytically in¬ 
tractable or very expensive to compute (Pritchard et ah, 1999; Marjoram et ah, 2003). 
Recent applications run across areas such as evolutionary genetics (Beaumont et ah, 
2002), epidemiology (McKinley et al., 2009), astronomical model analysis (Cameron 
and Pettitt, 2012), among others (Csillery et al., 2010). 

Vanilla ABC simulates parameter/data pairs ( 9 , x) from the prior distribution whose 
density is p(9)p(x\9), accepting 6 as an approximate posterior draw if its companion 
data x is “close enough” to the observed data x 0 f, s . If p(x, x 0 t, a ) is the chosen measure of 
discrepancy, and e is a discrepancy threshold defining “close”, then accepted parameters 
are a sample from p(9\p(x, x 0 b s ) < e). Often, it is not possible or efficient to use the 
complete data set. In that case, a reduced dimensional set of summary statistics S can 
be used so that the accepted draws sample from p(9\p(S, S 0 b s ) < e). For simplicity of 
notation, the posterior will be denoted by p(9\p(x, x 0 b s ) < e), with the understanding 
that the effective data x potentially represents such a summary of the original data. 

A main issue is high rejection rates that result from: (i) the requirement that e be 
small to build faith that the approximate posterior is a good approximation to p(9\x 0 b s ), 
and (ii) the posterior p(9\x 0 b s ) may be concentrated in completely different regions of 
parameter space than the prior. To address this, modifications of vanilla ABC are emerg¬ 
ing, including regression adjustment strategies (e.g., Beaumont et al., 2002; Blum and 
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Frangois, 2010; Bonassi et al., 2011), and automatic sampling schemes (Marjoram et al., 
2003; Sisson et al., 2007) utilizing techniques such as sequential Monte Carlo (SMC). 
In the latter ABC methods, SMC is used in order to automatically, sequentially refine 
posterior approximations to be used to generate proposals for further steps. At each 
of a series of sequential steps indexed by t, these methods aim to generate draws from 
p(9\p(x, Xobs) < et) where et define a series of decreasing thresholds. Total acceptance 
rates can be significantly higher than with vanilla ABC scheme (Toni et al., 2009; Toni 
and Stumpf, 2010). 

The original version of the ABC SMC algorithm proposed in Sisson et al. (2007) 
was motivated by the SMC samplers methodology of Del Moral et al. (2006). Later, 
Beaumont et al. (2009) realized that this original method can result in biased samples 
relative to the true posterior, and this was followed by development of a corrected 
approach (Beaumont et al., 2009; Sisson et al., 2009; Toni et al., 2009). The general 
form of the algorithm, which relies fundamentally on sequential importance sampling, 
is shown in Figure 1. 

Applications of this ABC SMC algorithm have been presented in a variety of ar¬ 
eas including population genetics (Beaumont et al., 2009), systems biology (Toni et al., 
2009; Toni and Stumpf, 2010; Liepe et al., 2010) and psychology (Turner and Van Zandt, 
2012). In terms of methodology development, there is increasing interest in extensions 
and improvements of this algorithm, as demonstrated in, for example, Lenormand et al. 
(2013); Filippi et al. (2013); Silk et al. (2013). Building on this momentum and open 
challenges to improving the methodology, our current focus is on the form of the ABC 
SMC of Figure 1. We note and comment on the ABC SMC approach of Del Moral 
et al. (2011); this makes use of an adaptive threshold schedule, extends SMC sam¬ 
plers (Del Moral et al., 2006) and uses an Markov chain Monte Carlo (MCMC) kernel 
for propagation of particles. As a result, the computation of weights has linear complex¬ 
ity as a function of the number of particles, while a disadvantage of this approach is that 
it can result in particle duplications, possibly leading to inferior overall performance (as 
empirically shown in Lenormand et al., 2013) in comparison to the basic algorithm of 
Figure 1 that we begin with here. 

One aspect of the algorithm of Figure 1 is that the computational demand in evalu¬ 
ating weights increases quadratically as a function of the number of particles. While this 
appears disadvantageous, in practice it is often just not a limiting factor. The reason for 
this is that in practical ABC applications, computation time is typically substantially 
dominated by the repeated data simulation steps; this is borne out in our own expe¬ 
riences discussed further below, and has been clearly indicated and discussed in other 
studies, including Beaumont et al. (2009); Filippi et al. (2013); Del Moral et al. (2011), 
for example. 


2 ABC SMC with Adaptive Weights 

The ABC SMC strategy above is intimately related to adaptive importance sampling 
(Liu, 2001; Cornuet et al., 2012) using adaptively refined posterior approximations based 
on kernel mixtures as importance samplers. Originating as a method for direct posterior 
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1. Initialize threshold schedule ei > • • • > ct 

2. Set t = 1 

For i = 1,..., N 

- Simulate 9^ ~ p(9) and x ~ p{x\9 until p{x, x Q i , s ) < 

- Set Wi = 1/N 

3. For t = 2,..., T 

For i = 1,... ,N 

- Repeat: 

Pick 0* from the 9^ 1 - > ’s with probabilities w^ _1 \ 

draw 9^ ~ K t (9^\9*) and x ~ p(x\9^); 
until p(x,x obs ) < e t 

- Compute new weights as 


(i) 

ur oc 


P(0; 


(th 




Normalize over i = 1,..., N 


Figure 1: ABC SMC algorithm (Beaumont et ah, 2009; Sisson et ah, 2009; Toni et ah, 
2009). Here A' t (-|-) is a conditional density that serves as a transition kernel to “move” 
sampled parameters and then appropriately weight accepted values. In contexts of real¬ 
valued parameters, for example, Kt(9\9*) might be taken as a multivariate normal or t 
density centred at or near 9*, and whose scales may decrease as t increases. 


approximation (West, 1992, 1993b) in complex models, that approach defines kernel 
density representations of a “current” posterior approximation gt(9) = Y2j w jKt{0\9j) 
as an importance sampler for a next step t- 1-1, then adaptively updates the parameters 
defining I\ t (- 1-) as well as importance weights (West, 1993a; Liu and West, 2001). 

This historical connection motivates an extension of ABC SMC that is the focus of 
this paper. That is, simply apply the idea of kernel density representation to the joint 
distribution of accepted values (x, 9) using a joint kernel K t (x, 9\x*,9*). For this paper, 
we use a product kernel K t (x, 9\x*, 9*) = K Xt t{x\x*)Kg tt {9\9*), which leads to major 
benefits in terms of computational convenience. The underlying idea that we are working 
with kernel density approximations to the joint distribution of (x, 9) means that we can 
rely on the utility of product kernel mixtures generally (e.g., Fryer, 1977; Scott and Sain, 
2005). A joint approximation g t (9,x) oc Y2j w j-^x,t(x\xj)Kg tt (6\6j) yields a marginal 
mixture for each of x and 9 separately, and a posterior approximation (emulator or 
importance sampler) with density gt(9\x 0 b s ) oc Y2j w jKx,t(x 0 bs\xj)Kg tt (9\9j). In our 
proposed extension of ABC SMC, this form is used to propose new values, and we can 
immediately see how proximity of any one Xi to the observed data x 0 b s will now help 
raise the importance of proposals drawn from or near to the partner particle 9i. As for 
Kg t t{-\9), multivariate normal or t density are natural choices for K x ^{-\x). For a more 
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1. Initialize threshold schedule ei > ■ ■ ■ > ct 

2. Set t = 1 

For i = 1,... ,N 

- Simulate 6f' > ~ p(0) and x ~ p(x\9^) until p(x, x Q i , s ) < 

- Set Wi = 1/N 

3. t = 2,... ,T 

Compute data based weights vf ^ oc wf ^K x ,t{%obs 1^ 1 ' ) ) 
Normalize weights vf 1 ' ) over i = 1,... ,N 


For i = 1,..., N 

- Repeat: 

Pick 0* from the of 1 ^’s with probabilities vf -1 ' 1 

draw df'* ~ Ke tt (Gf^\0*) and x 
until p(x,x obs ) < e t 

- Compute new weights as 


(t) 

Wi oc 


p(s, 


(t)s 


E 


Relief ~ l) ) 


3 3 


Normalize 


,(*) 


over i = 1,..., N 


Figure 2: ABC SMC with Adaptive Weights. 


detailed discussion of kernel choices, see Silverman (1986) and Scott (1992). 

Figure 2 shows the algorithmic description of this new ABC SMC with Adaptive 
Weights (ABC SMC AW). The inclusion of a new step where the weights are modified 
according to the respective values of x adds computations. However, since computational 
time in the ABC SMC algorithm is usually dominated by the extensive repetition of 
model simulations, the increased compute burden will often be negligible. Note also that 
the original ABC SMC is a particular case when K x>t (-\x) is uniform over the region of 
accepted values of x. 

The idea of approximating the joint distribution of parameters and data together 
is also present in Bonassi et al. (2011), where mixture modeling is applied to the joint 
distribution and then used as a form of nonlinear regression adjustment. The smoothing 
step in ABC SMC AW can be seen as an automatic simplified version of that approach, 
about which we say more in the concluding Section 6. 


3 Theoretical Aspects 

We discuss some of the structure of ABC SMC AW to provide insight as to why it can 
be expected to lead to improved acceptance rates. 
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First, note that the final output of ABC SMC methods is an estimate of the target 
distribution p(9\d(x, x 0 b s ) < e-r). This provides the practitioner with the convenient op¬ 
tion of using the output as a final approximation or as an input to be refined by using 
a preferred regression adjustment technique, such as local linear regression (Beaumont 
et ah, 2002) or mixture modeling (Bonassi et ah, 2011). However, in the context of inter¬ 
mediate ABC SMC steps, more refined approximations have the potential to improve 
efficiency; this can be automatically achieved by use of kernel smoothing techniques. 
At each intermediate step t, this motivates the following approximation for the joint 
density p(x, 9), locally around x = x 0 b s - 

p t {x,9\A) = j J p(9\A)p(x\9,A)Kg(9\9)K x (x\x)dxd9 (1) 

where A = {x : d(x,x 0 b s ) < et} represents the acceptance event at step t. Kg and K x 
are the kernel functions as described in Section 2, and approximate draws of p{9,x\A) 
are obtained via importance sampling. 

This defines an implicit posterior emulator, namely 

Pt(9\x 0 bs) oc J J p(9\A)p(x\9,A)Kg(9\9)K x {x 0 bs\x)dxd9. (2) 

We now explore equation (2) to ensure that: (A) it defines a valid importance sampler 
with target p(9\p(x, x 0 b s ) < et) at each step t, and (B) the resulting expected acceptance 
rates are higher than for regular ABC SMC. We address these two points in turn. 


(A): At step t, note that 9 is sampled from the distribution having density g{9) = 
Vj t ~ 1 ^t,e(9\9 < ' J t ~ 1 ^). Given this value, the extra step then simulates x until the event 
A is true. This event has probability pr(A\ 9), where pr(A\ 9) = J A p(x, 9)dx, resulting 
in the overall proposal 


g(9\A) oc g{9)pr{A\ 9). 


(3) 


Finally, to sample from p(9\A) = p{9\d[x, x 0 b s ) < et) based on this proposal, the impor¬ 
tance sampling weights are 


w{9) 


p(9)pr(A\ 9) 
g{9)pr{A\ 9) 


p(Q) 

m' 


(4) 


which have exactly the same form as in Figure 2. 


(B): At step t, equation (1) describes the approximation for the joint density of 
(x,9) locally around x = x 0 bs- Based on this representation, it is possible to show 
that the proposal distribution implicitly defined in ABC SMC AW results in higher 
prior predictive density over the acceptance region for the next SMC step. This is seen 
as follows. For simplicity of notation, use po(x,9) to refer to pt(x,9) at the current 
step t. The proposal density in ABC SMC is po{9) 1 whereas that for ABC SMC AW 
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is po(0\x 0 bs)- These two densities induce marginal prior predictive densities po{x) and 
p i(x), respectively. Integration of a prior predictive density over the acceptance region 
A t+ 1 = {x : p(x,x 0 bs) < Ci+i} yields the corresponding acceptance probability for the 
next SMC step, namely pr(A t + 1 ). We now show that pri(A t +\) > pr 0 (A t + 1 ) so that 
ABC SMC AW improves acceptance rates over regular ABC SMC. Our proof relies on 
the assumption that acceptance probability pro(A t+ i| 0 ) is positively correlated with 
_Po(A’obs|©) with respect to 9 ~ Po(9). This is a reasonable assumption to make in regular- 
cases, including the limiting case that et+i —t 0 when the correlation tends to 1 . 

Under ABC SMC we have 

pr 0 (A t+1 ) = J pr 0 (A t+1 \9)p 0 (9)d9 = E(pr 0 (A t+1 \Q)), 

where the expectation is with respect to Po{-)- The corresponding value under ABC 
SMC AW is 

pr 1 (A t+1 ) = J pr 0 (A t+1 \9)p 0 (9\x o bs)d9 

,, , a ^Po{x 0 bs\9)p 0 (9) E(pr 0 (A t+ i\Q)p 0 (x obs \Q)) 

pr 0 {At +1 \9) - - -r- d9 = - - -t- 

P0\Xobs) PO^Xobs) 

E(pr 0 (A t+1 \<d))E{po{x obs \0)) 

> - 7 -T- =pr 0 {A t+1 ), 

PO \Xobs ) 

assuming Cov(pr 0 (A t+1 \@)p 0 (x o bs\@)) > 0 . 



4 Illustrative Example: Normal Mixtures 

4.1 A Standard Example 

A simple example taken from previous studies of ABC SMC (Sisson et ah, 2007; 
Beaumont et ah, 2009) concerns scalar data x\9 ~ 0.5Af(9, 1) + 0.5J\f(9, 0.01) and 
prior 9 ~ U{— 10,10). With observed value x 0 b s = 0, the target posterior is 9\x obs ~ 
0.5A7(0,1) + 0.5A/"(0,1/100) truncated to (-10,10). 

We follow details in Sisson et al. (2007) with discrepancy measure p(x,x 0 b s ) = 
\x — x ob s | and threshold schedule ei : 3 = (2, 0.5, 0.025), and we use normal kernels Kg >t 
and K x t with standard deviations (or bandwidth parameters) hg and h x , respectively. 
Following standard recommendations in West (1993b) and Scott and Sain (2005), the 
bandwidths hk, for k £ {s, 0 }, are set at hk = dk/N 1 / 6 where &k is the standard 
deviation, which is computed based on the values of the particles and their respective 
weights. This standard rule-of-thumb specification is an asymptotic approximation to 
the optimal bandwidth choice based on the mean integrated squared error of the product 
kernel density estimate (Scott, 1992). 

We ran ABC SMC and ABC SMC AW to obtain samples of N =5,000 particles in 
each case. Table 1 shows the average number of simulation steps per accepted particle; 
the adaptive weight modification requires about 30% fewer simulations. As shown in 
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t 

e t 

ABC SMC 

ABC SMC AW 

1 

2 

5.01 

4.96 

2 

0.5 

4.33 

2.38 

3 

0.025 

39.71 

27.22 


Total 

49.05 

34.56 


Table 1: Normal mixture example: Average number of simulation steps per accepted 
particle for study with TV = 5, 000 particles. 



e 


Figure 3: Normal mixture example: Approximate (lines) and exact (shaded) posterior 
densities. 
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Figure 4: Monte Carlo estimates for the normal mixture example: Box-plots of estimated 
posterior moments (as annotated) based on 50 repeated runs of SMC ABC and SMC 


ABC AW. 


Figure 3, the resulting posterior approximations are quite accurate. More importantly 
from the viewpoint of the methodology here, the two strategies give very closely similar 
results, with the AW variant substantially improving the computational efficiency in 
terms of acceptance rates. The suggested equivalence of accuracy of the methods is also 
confirmed in Figure 4, which displays a comparison of Monte Carlo estimates based on 
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50 repeated runs of both methods using N =1,000 particles. The plots in the figure 
do not indicate any significant difference in the distribution of Monte Carlo estimates 
produced by both methods. In terms of coefficient of variation of the final importance 
weights, the outputs of both methods were also similar, with average coefficient of 1.34 
(with standard deviation 0.46) for ABC SMC AW, and average coefficient of 1.13 (with 
standard deviation 0.37) for ABC SMC. 

4.2 Multivariate Mixture Examples 

To more aggressively explore performance, we have run studies on multivariate versions 
of the normal mixture example. Generally, take p —dimensional data x|0 ~ 0.5A f p {0, I p )+ 
0.5J\T p (0, 0.01/p), where x = (xi, ..., x p )' and Q = (#i,..., 0 p )'. The prior has 6 uni¬ 
formly distributed on [— 10 , 10 ] p . 

For the method implementation we use normal kernels Kg t and K xt with diag¬ 
onal variance matrices and with scalar bandwidths defined via the standard rule-of- 
thumb (Silverman, 1986; Scott and Sain, 2005) as used in the example of the previous 
section. That is, for each scalar dimension k of parameters and data, if dk denotes the 
standard deviation in that dimension, then hk = d'kN~ 1 ^ d+i ' > where N is the number 
of particles and d is the total dimension (parameters and data). The discrepancy used 
is p(x,X 0 bs) = J2k( X k — Xk,obs) 2 - 

Given the observation x 0 f, s = (0,..., 0)', we ran ABC SMC and ABC SMC AW to 
obtain samples of N =5,000 particles for different cases of dimension p. In each case, 
the threshold schedule was defined to keep correspondence with the schedule used in the 
example of Section 4.1. This correspondence was achieved by selecting threshold values 
that result in the same percentiles of the discrepancy distribution of prior:model simula¬ 
tions. The comparison results for repeat simulations for all cases with p £ {1,2,..., 50} 
show substantial reduction in the number of required model simulations induced by 
the adaptive weight modification. Across this range of dimensions, the reductions seen 
run from about 30 — 70% (as seen in Table 2), confirming that the adaptive weighting 
strategy can be effective in higher-dimensional problems, even with p as high as 50 or 
more. 


dimension p 

1 

2 

5 

10 

15 

20 

25 

30 

40 

50 

ABC SMC 

40.3 

34.1 

35.5 

33.4 

33.6 

31.3 

31.4 

26.9 

21.4 

18.8 

ABC SMC AW 

29.7 

14.4 

10.2 

11.4 

12.8 

13.1 

13.9 

12.2 

11.4 

11.4 


Table 2: Multivariate normal mixture example: Average total number of simulation 
steps per accepted particle. 


5 Comparison in Applications 

Two studies demonstrate the improvements achievable using adaptive weights in a cou¬ 
ple of interesting applied contexts, one using real data and the other synthetic data 
(so that “truth” is known). In each we use the same algorithm setup as described in 
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Section 4.2. For the threshold schedule, we follow Beaumont et al. (2002) with a pilot 
study of prior:model simulations to identify et as a very low percentile of the distribu¬ 
tion of the discrepancies, and then define a schedule e\-t to gradually reduce to that 
level. See also Bonassi et al. (2011) for more insight into this approach and variants. 
Specific values of the et are noted in the following sections. 

5.1 Toggle Switch Model in Dynamic Bionetworks 

A real data example from systems biology comes from studies of dynamic cellular net¬ 
works based on measures of expression of genes (network nodes) at one or more “snap¬ 
shots” in time. The model, context and flow cytometry data set come from Bonassi 
et al. (2011). This is data from experiments on bacterial cells using an engineered gene 
circuit with design related to the toggle switch model (Gardner et al., 2000). The model 
describes the dynamical behavior of a network with two genes (u and v) with doubly 
repressive interactions. In discrete time the specific model form is 

^ c,t-\-h ^c,t T hcy u /(1 T Vc,t^ u ') h(l -f- 0.03u Cj f) T h0.5^c,u,£, /r\ 

v c ,t+h = v c j + ha v /(l + u c ^ v ) - h(l + 0.03i> Cjt ) + /i0.5£ Cj „ )t , 

over time t , where c indexes bacterial cells, h is a small time step, the are indepen¬ 
dent standard normals, and for some specified initial values it Ci o,'c c ,o- 

The observable data set is a sample from the marginal distribution of levels of just 
one of the two genes at one point in time, viz. y = {y c ,c = 1 : 2 , 000 }, where y c is a 
noisy measurement of u T at a given time point r. The measurement error process is, 
cell-by-cell, given by 

Vc = u C}T + y + yay c /u2 tT1 c = 1 ,..., 2 , 000 , ( 6 ) 

where the rj c are standard normals, independent over cells and independent of the 

stochastic terms £ Vj t in the state evolution model of equation (5). The study of Bonassi 
et al. (2011) uses h = 1, r = 300 and initial state u Ct o = v Ci o = 10, adopted here. The 
full set of 7 model parameters is 9 = (a u ,a v ; (3 U ,(3 V ', /r, a, 7 ). 

The data is shown in Figure 5. Following Bonassi et al. (2011), the sample data set 
is reduced to a set of summary “reference signatures” that define the effective data x 
we aim to condition on. The details of the dimension reduction from the full, original 
sample y to x are not of primary interest here, though they are of course critically 
important for the application area; readers interested in the specifics of the applied 
context and the precise definition of x can consult Bonassi et al. (2011). Here we simply 
start with the reduced, 11 -dimensional data summary x, giving supplementary code and 
data that provide the relevant details and precise definition of x. We do note that, in 
addition to dimension reduction, the specific definition of x in Bonassi et al. (2011) has 
the advantage of producing an orthogonal projection of transformed raw data so that 
the sample elements of x are uncorrelated, which makes our use of diagonal variance 
matrices in the kernels K x t particularly apt. 

The prior for 9 is comprised of independent uniforms on real-valued transformed 
parameters, defined by finite ranges for each based on substantive biochemical back¬ 
ground information. The priors are also taken from the prior study in Bonassi et al. 
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Figure 5: Toggle switch study: Data are shown in the lower right frame (sample size 
C =2,000 cells). The other frames represent approximate posterior marginal densities 
for each of the 7 toggle switch model parameters, as annotated, to compare results from 
ABC SMC (red) and ABC SMC AW (blue). 


(2011) and are indicated by the ranges of the plots in Figure 5. We summarize analyses 
with N =2,000 particles and £ 1,5 = (2500, 750, 250,150, 75) where the final tolerance 
level £t = 75 corresponds to an approximate 10 ~ 4 quantile of simulated prior discrep¬ 
ancies. Table 3 shows the average number of prior:data generation steps per acceptance 
for both ABC SMC and ABC SMC AW. 

Evidently, SMC ABC AW leads to a reduction of roughly 30% simulation steps, a 
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t 

£i 

ABC SMC 

ABC SMC AW 

1 

2500 

12.3 

12.2 

2 

750 

16.1 

10.6 

3 

250 

24.3 

20.6 

4 

150 

42.1 

28.2 

5 

75 

96.4 

61.8 


Total 

191.3 

133.4 


Table 3: Toggle switch study: Average number of simulations per accepted particle for 
study with N = 2, 000 particles. 

practically very material gain in efficiency. Combined with this, Figure 5 shows that 
the approximate posteriors from ABC SMC AW are practically the same as those from 
ABC SMC, the differences being easily attributable to Monte Carlo variation. 

The toggle switch example is also used to illustrate some aspects of computation 
time of ABC SMC in a real application. We implemented the methods discussed by 
using a PC Intel 3.33 GHz. The total processing time for the ABC SMC algorithm 
was 70,543 seconds, while for the version with adaptive weights the time was 49,332 
seconds. From that comparison we can observe the same gain in efficiency as observed 
in the number of required simulation steps, confirming the point that the computation 
of the adaptive weights adds negligible processing time to the ABC SMC algorithm. 

We also implemented the method of Del Moral et al. (2011) to illustrate the per¬ 
formance difference that is induced by the computation of the weights based on linear 
complexity in the number of particles. This implementation was based on the same set¬ 
tings discussed before, i.e., with the same number of particles, threshold schedule and 
normal kernel. The remaining element to be defined in that method is the number M 
of synthetic data sets generated for each particle, and here we use M = 31 so that the 
method relies on roughly the same overall number of simulations steps that were used in 
the ABC SMC implementation discussed before (without adaptive weights). Given this 
setup, the implementation of the method of Del Moral et al. (2011) was only 0.8% faster 
than the previous ABC SMC implementation. This result confirms the point discussed 
in Section 1, that the linear complexity in the computation of the weights does not nec¬ 
essarily result in significant gain in computation time for practical applications of ABC 
SMC. Lenormand et al. (2013) present a more complete comparison of these methods 
under different scenarios, which suggests that the algorithm with linear complexity in 
the computation of the weights does not necessarily result in better overall computa¬ 
tional efficiency. That study also takes into account the accuracy of the methods in the 
overall comparison. 


5.2 Queuing System 

The second study is of a queuing system previously discussed in Heggland and Frigessi 
(2004) and addressed using ABC by Blum and Frangois (2010) and Fearnhead and 
Prangle (2012). The context is a single server, first-come-first-serve queue (M/G/l). 
Three parameters 6 = [ 6 \. 62 , 9 3 )' determine the distributions of service and inter-arrival 
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0 , 


Figure 6: Queuing system analysis: Summaries of analyses of simulated data set where 
the 3 frames represent margins for each of the model parameters as annotated, showing 
results from ABC SMC (red) and ABC SMC AW (blue). The true known parameter 
values underlying the synthetic data are marked as red triangles on the horizontal axes. 


times; service times are uniform on [$i, $ 2 ] while inter-arrival times are exponential with 
rate 63 . In the notation of Heggland and Frigessi (2004), W r is the inter-arrival time 
of the rth customer and U r the corresponding service time. The inter-departure time 
process {Y r , r = 1, 2,... } is then 


Y r = 


U r , 

Ur + T^lWi-E^Yi 


if £I=i Wi < £1=1 Yi, 
if £[=1 Wi > J 2 r iZi Y i- 


(7) 


Inter-arrival times are unobserved; the observed data from R customers is 
x = {Yi,..., Yr} where R is the number of customers. 

We generated a synthetic data set following the specification in Blum and Frangois 
(2010); here R = 50 with “true” parameters 6 = (1,5,0.2)'. Summary statistics are 
taken as 3 equidistant quantiles together with the minimum and maximum values of 
the inter-departure times. The prior has (6h, 82 — 81 , 83 ) uniformly distributed on [0,10] 3 . 

Analysis used N =1,000 particles and discrepancy schedule ei :5 = (200,100,10,2,1). 
The final level ct = 1 corresponds to a value close to the 10~ 4 quantile of simulated 








F. V. Bonassi and M. West 


183 





ft, 


H 0 


§1 28 

27 

26 


6 


ABC SMC ABC SMC AW 


ABC SMC ABC SMC AW 






S § 


ABC SMC ABC SMC AW 



Figure 7: Monte Carlo estimates for the queueing system example: Box-plots of esti¬ 
mated posterior moments (as annotated) based on 50 repeated runs of SMC ABC and 


SMC ABC AW. 


prior discrepancies. Figure 6 shows close agreement between the estimated marginal 
posterior densities under ABC SMC and ABC SMC AW, and, incidentally, that they 
support regions containing the true parameters underlying this synthetic data set. Fig¬ 
ure 7 displays the comparison of Monte Carlo estimates based on 50 repeat runs of 
the two methods using N =1,000 particles. The plots in the figure do not suggest any 
significant difference in the distribution of Monte Carlo estimates that result. This sim¬ 
ilarity between the methods was observed for the coefficients of variation of the final 
importance weights. An average coefficient of 1.31 (with standard deviation 0.73) was 
observed for ABC SMC AW, and average coefficient of 0.92 (with standard deviation 
0.65) for ABC SMC. 

In order to study the gain in efficiency derived from the use of adaptive weights, 
we repeated this analysis 100 times, each repeat involving a new prior:data simulation. 
Table 4 shows summaries of the numbers of data generation steps necessary for the ABC 
SMC and ABC SMC AW algorithms. We see that we realized an average reduction of 
about 58% in the number of simulation steps by using adaptive weighting. 
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t 

e t 

min 

mean 

max 

min 

mean 

max 

1 

200 

1.0 

1.3 

3.1 

1.0 

1.3 

3.0 

2 

100 

1.2 

1.4 

2.1 

1.0 

1.0 

1.4 

3 

10 

3.3 

12.7 

760.5 

1.2 

2.4 

60.6 

4 

2 

2.4 

9.0 

134.1 

1.4 

3.9 

39.9 

5 

1 

1.9 

6.9 

107.5 

1.1 

4.5 

80.7 


Total 


31.3 



13.1 



Table 4: Queuing system analysis: Summaries of 100 replicate synthetic data analyses, 
showing average numbers of simulation steps per accepted particle. 


6 Additional Discussion 

This paper has introduced ABC SMC AW, shown that it is theoretically expected to 
improve the effectiveness of ABC SMC based on adaptive, data-based weights, and 
demonstrated some of the practical potential in two interesting model contexts from 
related literature. The new approach is simple to implement, requiring only a minor 
extension of standard ABC SMC code. Further, the computational overheads adaptive 
weighting generates are- in anything but trivial models- typically quite negligible rel¬ 
ative to the main expense of forward simulations of prior predictive distributions. We 
also note that the adaptive weighting idea has the potential to be integrated into other 
ABC SMC extensions, even though this integration requires careful study of potential 
computational and theoretical implications. 

The basic idea of adaptive weights links closely to adaptive importance sampling 
and direct posterior approximations based on mixtures of kernel forms. As noted in 
Section 2, the local smoothing for nonlinear regression adjustment in ABC of Bonassi 
et al. (2011) uses multivariate normal mixtures in related ways as “local” posterior ap¬ 
proximations in regions defined by the threshold setting. There the mixture modelling, 
used to define posterior emulators, is based on large-scale Bayesian nonparametric mod¬ 
els that can have many mixture components and so flexibly adapt to the shapes of local 
posterior contours. Incidentally, mixture fitting is computationally effective based on 
GPU parallelized code (Suchard et al., 2010; Cron and West, 2011). This connection 
suggests a more general adaptive weighting strategy that uses a joint kernel that is not 
of product form, with the potential to customize the weighting of sampled parameters 
further. That is, in the AW algorithm of Figure 2, the kernel Ks.t{9\9*) would be mod¬ 
ified to have shape characteristics that depend also on the locale in which the kernel 
location 9* sits. Building on the connections with multivariate normal mixtures sug¬ 
gests specific ways in which these modifications could be developed, and this is under 
investigation. At this point, however, it is unclear just how beneficial this will be in 
practice. Such extensions will require substantial additional computational overheads 
to identify and compute local kernel functions, which may more than offset the gains 
in efficiency the extensions can be expected to generate. The gains in acceptance rates 
already achieved by the very simple (to code and run) adaptive weighting method of 
this paper can already be very substantial, as our examples highlight. 
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